ABSTRACT

Frontal techniques offer the potential for processing the assembly
and the factorization phases of finite element analysis in
parallel. However, the rows of the stiffness matrix are assembled
and factored in different orders, thus depriving frontal solvers of
the uniformity desired in parallel processing. On the other hand,
band solution techniques handle the factorization phase in a very
uniform way but do not interleave assembly and factorization. In
this paper, we suggest a technique that borrows from both frontal
and band solvers those characteristics that are advantageous for
parallel processing. Moreover, bookkeeping and data manipulation
are simpler in the suggested technique than in the classical
frontal method. This makes the suggested technique also attractive
for sequential systems.

1.  INTRODUCTION

The frontal solution technique [5] is a very effective means for
reducing the computer time and the storage requirement of finite
element analysis. Its central concept is the alternation between
the assembly of the stiffness matrix H and its factorization. To be
more specific, let the elements in the finite element grid be
labeled by unique integers 1, ..., m, and processed in that order.
That is, the corresponding element matrices $H^1, \ldots, H^m$ are
accumulated in the global matrix H in the given order. Also let the
nodes in the grid be numbered by the integers 1, ..., n. If d is
the number of degrees of freedom at each node, then we may
associate node i with the rows (i-1)d+1, ..., id of H. In this
paper, we will simplify the discussion by assuming that d = 1.
However, it is easy to see that the results apply to the case
$d \neq 1$ as well.

After the processing of an element e (the accumulation of $H^e$
into H) and before the processing of element e+1, we may define
the following two sets of rows of H:

1) The set of partially assembled rows
$\{ i \mid i \in 1 \cup \cdots \cup e \text{ AND } i \in (e+1) \cup \cdots \cup m \}$,
where $i \in e$ denotes that i is a node in element e.

2) The set of ready rows
$\{ i \mid i \in e \text{ AND } i \notin (e+1) \cup \cdots \cup m \}$.
Any row in this set will not be modified by the processing of
future elements.

The union of the above two sets is called the active front at
element e and is denoted by $F_a(e)$.


A frontal solver identifies, after the processing of each element
e, the rows in $F_a(e)$ that are ready, uses each ready row i of H
to eliminate the sub-diagonal elements in column i, and then
removes the ready rows from core memory. Hence, if $|F_a(e)|$ is
the cardinality of $F_a(e)$, and
$F = \max\{ |F_a(e)| : e = 1, \ldots, m \}$, then the frontal
solver needs to provide core storage for only F rows of H.
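
As an illustration of these definitions, the following minimal Python
sketch tracks the active front and the ready rows after each element.
The function name `active_fronts` and the three-element chain are
hypothetical and chosen only for the example; d = 1 is assumed, so a
node and its row are the same thing.

```python
def active_fronts(elements):
    """Return a list of (front, ready) pairs, one per element.

    `elements` is a list of node collections in processing order.
    `front` is the active front after assembling the element and
    `ready` is the subset of rows that no later element touches.
    """
    # last[i] = index of the last element that contains node i
    last = {}
    for e, nodes in enumerate(elements):
        for i in nodes:
            last[i] = e

    result = []
    front = set()
    for e, nodes in enumerate(elements):
        front |= set(nodes)                      # rows touched so far and still in core
        ready = {i for i in front if last[i] == e}
        result.append((set(front), ready))
        front -= ready                           # a classical frontal solver eliminates them now
    return result


# Hypothetical grid: a chain of three two-node elements.
chain = [(1, 2), (2, 3), (3, 4)]
fronts = active_fronts(chain)
F = max(len(front) for front, _ in fronts)       # maximum front size
```

For this chain the front never holds more than two rows, which is the
storage figure F referred to above.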

For large problems, frontal solvers have two advantages over band
solvers, namely: 1) they interleave the assembly and factorization
of H, and hence eliminate the need to store H in secondary storage
during the assembly and then to retrieve it during the
factorization; 2) they require less core memory than band solvers
because F is usually smaller than the bandwidth of the matrix H
[2,4]. However, $F_a(e)$ consists of non-contiguous rows of H,
which requires some indexing to keep track of the location of each
row in memory. Also, some preprocessing is needed in order to
determine the instant at which each row becomes ready (completely
assembled).


Figure  1 


Clearly, the rows of H do not become ready in sequential order,
which is a serious problem if the assembly and factorization are
to be executed in parallel on different hardware units (Fig 1).
For example, if an array processor is used for the factorization,
then indexing becomes a source of inefficiency. The same applies
if different processors are used for the assembly and the
factorization in a multiprocessor system. Moreover, all the
special purpose hardware suggested in the literature for matrix
factorization expects to receive the rows of H in sequential order
(see, e.g., [6] and [7]).

In this paper, we suggest a variation of the frontal technique
that does release the ready rows of H in order. This variation has
the added advantage that the instant at which each row is to be
released to the factorizer is uniquely determined by a parameter
$\delta$; that is, no preprocessing is needed to generate
information about the instant at which each row becomes ready.


The size of the core memory needed by the assembler is
proportional to the value of the parameter $\delta$, which
therefore has to be chosen in an optimal way. In Section 3, we
describe an algorithm for the determination of $\delta_{\min}$,
the optimal $\delta$ for a given problem. Then, in Section 4, we
restrict our attention to the sub-class of finite element grids
that are commonly used in practical applications, and we derive an
upper bound on $\delta_{\min}$ for this sub-class.


2.  An  order  preserving  frontal  technique 

In the rest of this paper we will not distinguish between the
application of the frontal technique to conventional or parallel
architectures. More specifically, we will use the term "a row is
consumed" to indicate that the row is factorized 'in core' in
conventional computers, or that the row is passed to the
factorization unit, in the case of parallel processing.

Let $X(e) = \max\{ i \mid i \in F_a(e) \}$. That is, immediately
after the assembly of element e, X(e) is the row with the largest
index in the active front. Also, let $\delta$ be an integer such
that, after the processing of any element e, rows
$1, \ldots, X(e) - \delta$ are ready (completely assembled), and
hence, may be consumed.

The basic idea in our modified frontal technique is to find the
minimum value of the integer $\delta$. This value is called the
width of the delayed front and is denoted by $\delta_{\min}$.
Given $\delta_{\min}$, the assembly of H and its factorization may
be interleaved as described by the following algorithm:

Last-consumed := 0 ;

For elements e = 1, ..., m do

    [] Assemble $H^e$ into H and determine X(e) ;

    [] If e < m,

       Then consume rows Last-consumed + 1, ..., $X(e) - \delta_{\min}$

       Else consume rows Last-consumed + 1, ..., n ;

    [] Last-consumed := $X(e) - \delta_{\min}$ ;


Clearly, after any iteration e, rows
$X(e) - \delta_{\min} + 1, \ldots, X(e)$ of H are not consumed and
have to be stored in memory. A circular buffer of size
$\delta_{\min}$ may be used to store these rows in sequential
order. Here, note that even if a row p,
$X(e) - \delta_{\min} < p \le X(e)$, is ready after the processing
of element e, its consumption is delayed until after the
processing of an element $\bar{e}$ with
$X(\bar{e}) - \delta_{\min} \ge p$. This is different from the
classical frontal technique where rows are consumed as soon as
they are ready.
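
The release schedule of the delayed front can be sketched in Python as
follows. This is only an illustration under stated assumptions: nodes
are numbered 1..n with d = 1, the running maximum node number stands
in for X(e) (which holds when the nodes are numbered in order of first
appearance), and the names `consumption_schedule`, `is_valid` and
`chain` are hypothetical.

```python
def consumption_schedule(elements, delta):
    """List, per element, the rows released to the factorizer."""
    n = max(max(nodes) for nodes in elements)
    m = len(elements)
    last_consumed = 0
    top = 0                                   # assumed stand-in for X(e)
    schedule = []
    for e, nodes in enumerate(elements, start=1):
        top = max(top, max(nodes))            # "assemble" element e
        if e < m:
            upto = max(last_consumed, top - delta)
        else:
            upto = n                          # last element: flush everything
        schedule.append(list(range(last_consumed + 1, upto + 1)))
        last_consumed = upto
    return schedule


def is_valid(elements, delta):
    """True if no released row is touched by a later element."""
    consumed = set()
    for e, rows in enumerate(consumption_schedule(elements, delta)):
        consumed.update(rows)
        for later in elements[e + 1:]:
            if consumed & set(later):
                return False
    return True


# Hypothetical chain of three two-node elements.
chain = [(1, 2), (2, 3), (3, 4)]
schedule = consumption_schedule(chain, delta=1)
```

With a width of 1 the rows leave strictly in order, one batch per
element, whereas a width of 0 would release row 2 before element 2 has
contributed to it.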

The purpose of delaying the consumption of the ready rows of H is
twofold: 1) to ensure that the rows of H are consumed in
sequential order and 2) to allow for an automatic determination of
the instant at which each row is to be consumed. The price to be
paid is a larger memory requirement. However, with today's
technology, this price is affordable as long as reasonable bounds
may be imposed on the width of the delayed front $\delta_{\min}$.
Such bounds will be discussed in Section 4.

In any frontal technique, the order in which the elements are
processed is crucial because it determines the size of the active
front. Hence, an element numbering is first chosen to minimize the
active front, then the nodes are numbered according to their
occurrences in the elements. More specifically, given an element
numbering, the following algorithm is usually used to number the
nodes:

ALG1

last-number := 0 ;

For elements e = 1, ..., m Do

    1) For each node v in e that is not numbered yet Do

        1.1) last-number := last-number + 1 ;

        1.2) Give v the number last-number ;
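
A direct transcription of this numbering step into Python might look
as follows. It is a sketch, not the authors' code: the function name
`number_nodes` and the node labels are hypothetical, and the order in
which the nodes of a single element are visited is simply the order in
which they are listed.

```python
def number_nodes(elements):
    """Number nodes in order of first appearance in the element sequence."""
    number = {}
    last_number = 0
    for nodes in elements:                # elements in their given order
        for v in nodes:
            if v not in number:           # node not numbered yet
                last_number += 1
                number[v] = last_number
    return number


# Hypothetical strip of three triangles with symbolic node labels.
triangles = [("a", "b", "c"), ("b", "c", "d"), ("c", "d", "e")]
numbering = number_nodes(triangles)
```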

This type of two phase node numbering scheme has been studied in
[3], where it is shown that if the elements are numbered using the
reverse Cuthill-McKee algorithm [1] then the profile, bandwidth
and anticipated fill-in of the matrix H resulting from the two
phase node numbering are comparable to those resulting from the
best known heuristic node numbering scheme, namely the reverse
Cuthill-McKee algorithm.

If ALG1 is used to number the nodes in the grid, then the width of
the delayed front $\delta_{\min}$ may be easily determined
provided that the element numbering is proper in the sense of the
following definitions:

Definition 1: Given a specific numbering of the elements, an
element e, $1 < e \le m$, is called "wrapped" if every node in e
is also in one of the previous elements 1, ..., e-1.

Definition 2: An element numbering is called "proper" if it does
not result in any wrapped element. That is, it satisfies the
property that any element e, $1 < e \le m$, contains at least one
node that is not in elements 1, ..., e-1.
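
These two definitions translate into a short check. The sketch below
is illustrative only; `wrapped_elements`, `is_proper` and the sample
connectivities are hypothetical names and data.

```python
def wrapped_elements(elements):
    """Return the 1-based indices of wrapped elements."""
    seen = set()
    wrapped = []
    for e, nodes in enumerate(elements, start=1):
        # An element is wrapped if every one of its nodes already
        # appeared in a previous element (the first element never is).
        if e > 1 and set(nodes) <= seen:
            wrapped.append(e)
        seen |= set(nodes)
    return wrapped


def is_proper(elements):
    """An element numbering is proper if it has no wrapped element."""
    return not wrapped_elements(elements)


# Hypothetical connectivities: the third triangle brings no new node.
improper = [(1, 2, 3), (2, 3, 4), (1, 3, 4)]
proper = [(1, 2, 3), (2, 3, 4)]
```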

Proposition 1: Given a finite element grid and a proper element
numbering, let the nodes in the grid be numbered by ALG1, and let
the half-bandwidth of the resulting matrix H be

$B_h = \max_{e = 1, \ldots, m} \{ |g(e) - s(e)| \}$,

where g(e) and s(e) are the largest and smallest node numbers,
respectively, in element e. If each element contains at most k
nodes, then

$B_h - k < \delta_{\min} \le B_h$

Proof: Consider the situation after the processing of any element
e. From the proper numbering, any element r, r > e, contains at
least one node not in elements 1, ..., e. Hence, g(r) > g(e),
where g(r) and g(e) are the largest node numbers in elements r and
e, respectively. But, from the definition of $B_h$, the smallest
node number in element r satisfies
$s(r) \ge g(r) - B_h > g(e) - B_h$. That is, rows
$1, \ldots, g(e) - B_h$ are not affected by the assembly of
element r. Noting that this is valid for any r > e and that
X(e) = g(e) in a proper labeling, we conclude that
$\delta_{\min} \le B_h$.

Next, let $\bar{e}$ be the specific element that satisfies
$B_h = g(\bar{e}) - s(\bar{e})$, and let $e = \bar{e} - 1$.
Clearly, at most k-1 nodes may be numbered in element $\bar{e}$,
that is, $g(\bar{e}) < g(e) + k$. Now, after the processing of
element e, rows $1, \ldots, g(e) - (B_h - k)$ are not all
completely assembled, because element $\bar{e} > e$ contains a
node $s(\bar{e}) = g(\bar{e}) - B_h < g(e) + k - B_h$. Hence, row
$s(\bar{e})$ will be affected by the assembly of element
$\bar{e}$, which proves that $\delta_{\min} > B_h - k$. ■
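
The bound can be checked numerically on a small grid. The sketch below
computes the half-bandwidth and, by direct search over the elements,
the smallest admissible width. It assumes a proper numbering produced
by ALG1 (so that the running maximum node number equals X(e)) and
d = 1; the function names and the three-quadrilateral strip are
hypothetical.

```python
def half_bandwidth(elements):
    """B_h = max over elements of (largest - smallest node number)."""
    return max(max(nodes) - min(nodes) for nodes in elements)


def delta_min(elements):
    """Smallest delta for which rows 1..X(e)-delta are ready after
    every element e < m (the last element releases all rows anyway)."""
    best = 0
    top = 0
    for e in range(len(elements) - 1):
        top = max(top, max(elements[e]))          # X(e), by assumption
        lowest_future = min(min(nodes) for nodes in elements[e + 1:])
        # Rows up to top - delta must lie strictly below every row that
        # a later element still modifies.
        best = max(best, top - lowest_future + 1)
    return best


# Hypothetical strip of three 4-node quadrilaterals numbered by ALG1.
strip = [(1, 2, 3, 4), (3, 4, 5, 6), (5, 6, 7, 8)]
k = 4
B_h = half_bandwidth(strip)
d_min = delta_min(strip)
```

For this strip the computed width falls inside the interval given by
Proposition 1.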


Figure 2 - (a) Proper labeling; (b) Non-proper labeling

Proposition 1 states that if we choose $\delta = B_h$, then we
will be away from the optimal $\delta_{\min}$ by at most k. In
Figures 2.a and 2.b, we give a proper and a non-proper element
numbering, respectively, for the same grid (element numbers are
enclosed in circles). The node numbering resulting from ALG1 is
also shown. In both cases $B_h = 9$, but in the non-proper case,
after the assembly of element 9, row
$g(9) - B_h = 20 - 9 = 11$ is not completely assembled and will be
affected by the assembly of element 10. In other words,
$\delta_{\min} > B_h$, which proves that proper element labeling
is essential for the result of Proposition 1 to hold.

If the elements in the grid are of the Lagrangian type with k > 4,
then each element contains at least one center node, and hence,
any element labeling is proper. However, general conditions for
the existence of a proper labeling are hard to obtain. In Figure
3, we show two grids for which no proper element labeling exists.
The choice of $\delta$ in such cases is discussed in the next
section.


3.  Choosing $\delta$ for general element labeling


Consider the triangular grid of Fig 3.b. Clearly, the elements
that are hatched in the figure are wrapped, and hence the given
element labeling is not proper. The corresponding node numbering
(also shown in the figure) yields $B_h = 7$. For this numbering,
row 16 is partially summed after the assembly of element 30, and
hence $\delta_{\min} > g(30) - 16 = 26 - 16 = 10$. That is, some
criterion other than $\delta = B_h$ should be used if the element
labeling is not proper.


Figure 3 - Grids that do not have proper labeling

The method that we suggest for the choice of $\delta$ is based on
the idea of augmenting the given grid with dummy nodes so that no
elements are wrapped. Each dummy node is given the same number as
the last numbered node, and the bandwidth, $B_a$, corresponding to
the augmented grid is computed. Then $\delta$ is taken to be equal
to $B_a$. For example, applying this procedure to the grid of Fig
3.b gives the grid of Fig 4, from which the augmented bandwidth is
found to be 10.


Figure 4 - The grid of Fig 3.b augmented with dummy nodes

Note that the augmented grid does not have to be constructed in
order to compute $B_a$. It suffices to keep track, while numbering
the nodes in ALG1, of the last node that has been numbered. More
specifically, we may modify ALG1 to compute $B_a$ as follows:


ALG2

last-number := 0 ; $B_a$ := 0 ;

For elements e = 1, ..., m Do

    1) For each node v in e that is not numbered yet Do

        1.1) last-number := last-number + 1 ;

        1.2) Give v the number last-number ;

    2) g(e) := last-number ;

       Find s(e), the smallest node number in e ;

       If ($B_a$ < g(e) - s(e)) Then $B_a$ := g(e) - s(e) ;
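
ALG2 can be rendered in Python along the following lines. This is a
sketch under the same assumptions as ALG2 itself; the function name
`augmented_bandwidth` and the four-triangle connectivity, in which the
last element is wrapped, are hypothetical.

```python
def augmented_bandwidth(elements):
    """Number the nodes as in ALG1 and return (B_a, numbering)."""
    number = {}
    last_number = 0
    b_a = 0
    for nodes in elements:
        for v in nodes:
            if v not in number:
                last_number += 1
                number[v] = last_number
        # g(e) is the last number handed out so far; for a wrapped
        # element this is the number its dummy node would carry.
        g = last_number
        s = min(number[v] for v in nodes)     # smallest node number in e
        b_a = max(b_a, g - s)
    return b_a, number


# Hypothetical connectivity: the fourth element reuses old nodes only.
mesh = [("a", "b", "c"), ("b", "c", "d"), ("c", "d", "e"), ("a", "b", "d")]
B_a, numbering = augmented_bandwidth(mesh)

# Ordinary half-bandwidth of the same numbering, for comparison.
B_h = max(
    max(numbering[v] for v in nodes) - min(numbering[v] for v in nodes)
    for nodes in mesh
)
```

Here the wrapped fourth element pushes the augmented bandwidth above
the ordinary half-bandwidth, which is the situation discussed next.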


It is easy to check that if the element numbering is proper, then
$B_a$ computed in ALG2 reduces to $B_h$, the half-bandwidth of the
matrix H.


Proposition 2: For a general element numbering, if $B_a > B_h$
then $\delta_{\min} = B_a + 1$, else, if $B_a = B_h$, then

$B_a - k < \delta_{\min} \le B_a$

where k is the number of nodes in each element.


Proof: Consider the situation after the assembly of a particular
element e. For any element r > e, ALG2 implies that any node
number v in r satisfies $v \ge g(e) - B_a$, where g(e) is the
largest node number in e. This is valid for any r > e, and hence
rows $1, \ldots, g(e) - B_a - 1$ are completely assembled and
$\delta_{\min} \le B_a + 1$.

Now, for $B_a = B_h$ the lower bound is proved as in Proposition
1. This bound may be tightened if $B_a > B_h$, because this
implies that there exists an element $\bar{e}$ such that a dummy
node, say $\alpha$, was added to $\bar{e}$ and
$B_a = \alpha - s(\bar{e})$. But from ALG2, there exists an
element $e < \bar{e}$ such that $g(e) = \alpha$. Hence, after the
processing of element e, rows $1, \ldots, g(e) - B_a$ are not all
completely assembled, because row $s(\bar{e}) = g(e) - B_a$ will
be modified during the assembly of element $\bar{e}$. This implies
that $\delta_{\min} > B_a$. ■

Proposition 2 shows that the choice of $\delta = B_a + 1$ is
optimal if $B_a > B_h$ and is away from the optimal by at most k
if $B_a = B_h$.


The next question to be asked is: How large can $B_a$ be compared
to $B_h$? This is important because it determines the maximum
storage needed by the assembler. However, it seems impossible to
obtain any bound on $B_a$ if complete freedom is allowed in the
construction of the grid and in the numbering of its elements. For
this reason, we define in the next section the class of W-proper
element numberings, which excludes arbitrary strange grids and
numberings.

4.  Upper bounds on $B_a$ for W-proper element numbering

The results reported by Law and Fenves [3] suggest the use of the
Cuthill-McKee (CM) algorithm or its reverse for numbering the
elements in two phase node numbering schemes. In their paper, the
following definition of adjacency is used:

Definition 3: Two elements in a finite element grid are called
adjacent if they share a common edge.

With this definition, the CM algorithm may be described as
follows:

ALG3 - The Cuthill-McKee algorithm

i := 0 ; Level[0] := { a specific starting element } ;

Repeat until all elements are numbered

    i := i + 1 ;

    Consider the elements in Level[i-1] in order of ascending
    numbering. For each element e, determine the elements that
    are adjacent to e and not yet numbered, number them and
    include them in Level[i]
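
The level-by-level numbering of ALG3 amounts to a breadth-first
traversal of the element adjacency graph. The Python sketch below
illustrates it under a few assumptions that ALG3 leaves open: the
adjacency is supplied as a dictionary, neighbours of an element are
visited in the order listed, and the grid is connected. The name
`cuthill_mckee_elements` and the 2 x 2 block of elements are
hypothetical.

```python
def cuthill_mckee_elements(adjacency, start):
    """Return the elements in the order in which they are numbered."""
    order = [start]                 # order[i] receives element number i + 1
    numbered = {start}
    level = [start]                 # Level[0]
    while level:
        next_level = []
        for e in level:             # already in ascending numbering
            for nb in adjacency[e]:
                if nb not in numbered:
                    numbered.add(nb)
                    order.append(nb)
                    next_level.append(nb)
        level = next_level          # Level[i] becomes Level[i-1]
    return order


# Hypothetical 2 x 2 block of quadrilaterals:  A B
#                                              C D
# Two elements are adjacent when they share an edge.
adjacency = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}
order = cuthill_mckee_elements(adjacency, "A")
```

The classical Cuthill-McKee heuristic additionally orders the
neighbours by degree; that refinement is omitted here because ALG3
does not specify it.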


The specification of the scheme used to number the elements does
not exclude grids with arbitrary strange shapes. The following
definition imposes some regularity on both the grid topology and
the numbering scheme:

Definition 4: An element numbering is called W-proper if each
wrapped element e shares a node v with an adjacent element
$\bar{e}$, such that v is not in elements
$1, \ldots, \bar{e} - 1$. Note that this implies that
$\bar{e} < e$ and that $\bar{e}$ is not wrapped.

More descriptively, if ALG1 is used to number the nodes in the
grid, then each element e that does not contain a new node should
contain at least one node v that has been numbered in an adjacent
element $\bar{e}$. For example, the numberings in Figure 3 are
W-proper, while the CM numbering of the grid shown in Figure 5 is
not W-proper. More specifically, in Fig 5, element 25 is wrapped
and its only adjacent element, namely 24, is also wrapped. For
this example, it is easy to see that $B_h = 10$ and $B_a = 22$.

Figure 5 - A non W-proper element numbering

As is clear from the above example, a non W-proper numbering may
be obtained only on very strange grids. Moreover, the following
may be proved:

Proposition 3: If the elements in the grid are of the serendipity
type, that is, contain nodes on the edges of the elements in
addition to those on the corners, then any element numbering of
the grid is W-proper.

Proof: Let e and $\bar{e}$ be two adjacent elements, and let
$e > \bar{e}$. From the hypothesis of the proposition, there is a
node v on the common edge of e and $\bar{e}$ so that v cannot be
in any other element in the grid. Hence v is not in elements
$1, \ldots, \bar{e} - 1$ and hence $\bar{e}$ cannot be wrapped. ■

Now, we are ready to establish a bound on $B_a$.


Proposition 4: If the CM algorithm is used to number the elements
of a grid, and the resulting numbering is W-proper, then

$B_a < 2 B_h$


Proof: Let e be the element that satisfies

$B_a = \alpha - s(e) = g(u) - s(e)$    (1)

where $\alpha$ is the dummy node added to e and u < e is the first
element before e that is not wrapped.

From the hypothesis, there is an element $\bar{e}$ adjacent to e
such that $\bar{e} < e$, and there exists a node v in both e and
$\bar{e}$ such that v is not in elements
$1, \ldots, \bar{e} - 1$. By the definition of $B_h$,

$v - s(e) \le B_h$    (2)

Given that Level[0] in the CM algorithm contains only one element,
u is in a level $i \ge 1$ and there exists an element $\bar{u}$
in level i-1 such that $\bar{u}$ is adjacent to u. In other words,
there is a node X in both u and $\bar{u}$ that satisfies

$g(u) - X \le B_h$    (3)

$X \le g(\bar{u})$    (4)

Moreover, $\bar{u} < \bar{e}$, because if $\bar{u} > \bar{e}$ then
the CM algorithm would not number u before e. Hence, any node not
in elements $1, \ldots, \bar{e} - 1$ has a number larger than
$g(\bar{u})$. In particular,

$v > g(\bar{u})$    (5)

From (2), (3), (4) and (5) we get

$g(u) - s(e) < 2 B_h$

The result then follows directly from (1). ■
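
For readers who want the intermediate step, the combination of (2)
through (5) can be written out as a telescoping sum. This is a
reconstruction, not part of the original proof: it writes $\bar{u}$
for the level-(i-1) element adjacent to u, and it reads (2), (3) and
(4) as non-strict inequalities and (5) as strict.

```latex
\begin{aligned}
g(u) - s(e)
  &= \bigl(g(u) - X\bigr) + \bigl(X - g(\bar{u})\bigr)
   + \bigl(g(\bar{u}) - v\bigr) + \bigl(v - s(e)\bigr) \\
  &< B_h + 0 + 0 + B_h = 2 B_h .
\end{aligned}
```

The first bracket is bounded by (3), the second is non-positive by
(4), the third is negative by (5), and the last is bounded by (2).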

5.  Conclusion

We presented a method for interleaving the assembly and solution
stages of finite element analysis. This method differs from the
classical frontal technique in the following:

1) The rows of the assembled matrix are made available for
factorization in order. This is important for parallel
processing.

2) The instant at which each row is made available is determined
automatically, rather than through elaborate preprocessing.

3) The storage required by the assembler is determined by the
width of the delayed front $\delta_{\min}$ rather than the maximum
size of the active front F. Although $\delta_{\min}$ is usually
larger than F, we proved that, for the type of meshes encountered
in practical applications, $\delta_{\min} \le 2 B_h$, where $B_h$
is the half-bandwidth of the stiffness matrix H.

4) The rows of H are stored in order. Hence, no indexing is needed
to keep track of the location of each row in memory.

In other words, features from both frontal and band solvers are
combined in a method that is easy to implement on either parallel
or uniprocessor systems.